Full Analysis of Philippine Food prices from 2007 - 2025
Aims:
Compare average prices across different regions for the same food item and year.
Examine how prices of a specific food item have changed over time within a region.
Identify regional price variations for different food items.
Predict future food prices based on historical data.
Code
from itables import init_notebook_modeimport pandas as pdimport numpy as npfrom scipy.stats import shapirofrom scipy.stats import kruskalimport seaborn as snsimport statsmodels.api as smimport matplotlib.pyplot as pltimport scipy.stats as statsfrom scipy.stats import levenefrom ipywidgets import interactimport dashfrom dash import dcc, html, Input, Outputimport plotly.express as pximport plotly.graph_objects as goimport scikit_posthocs as spimport ipywidgets as widgetsfrom IPython.display import display, clear_outputfrom pmdarima import auto_arimaimport warningsimport plotly.io as piofrom plotly.subplots import make_subplots
Code
init_notebook_mode(all_interactive=True)#open the file that was already cleaned, analyzed, and saved.file_path ="../Data/Processed/eda_results.csv"df = pd.read_csv(file_path)
Inspect the data
Code
# View the first few rows of the datasetdf.head()df
Region
Province
Food_Items
year
mean
median
Mode
Variance
Standard Deviation
IQR
Loading ITables v2.2.4 from the init_notebook_mode cell...
(need help?)
Code
# View the columns and data types of the datasetdf.info()
# View the summary statistics of the dataset though this may not make sense as they are already described in the EDA reportdf.describe()
year
mean
median
Mode
Variance
Standard Deviation
IQR
Loading ITables v2.2.4 from the init_notebook_mode cell...
(need help?)
Code
#check the shape of the datasetdf.shape
(91200, 10)
Code
# get the first and last readings for each food itemprint(df['year'].agg(['min', 'max']))
min 2007
max 2025
Name: year, dtype: int64
For comparing average prices across regions for the same food item and year, the closing price is generally the best price to use. Here’s why:
Consistency: The closing price is considered the most representative of the market’s consensus for that period. It accounts for the entire trading session and reflects both supply and demand dynamics over time.
Standard Usage: The closing price is the most widely used price in financial markets, meaning data and analysis are typically focused around this price.
Simplicity: It eliminates the noise created by intra-day fluctuations, focusing on the price at the end of the trading session, which is more relevant for long-term comparisons.
Code
# Filter the dataframe to keep only rows where 'food_item' starts with 'c_'df_filtered = df[df['Food_Items'].str.startswith('c_')]df_filtered
Region
Province
Food_Items
year
mean
median
Mode
Variance
Standard Deviation
IQR
Loading ITables v2.2.4 from the init_notebook_mode cell...
(need help?)
Loading ITables v2.2.4 from the init_notebook_mode cell...
(need help?)
Code
# Create figurefig = go.Figure()# Add traces for each food item and statfood_items = df_filtered['Food_Items'].unique()stats = ['mean', 'median']for food in food_items:for stat in stats: df_food = df_filtered[df_filtered['Food_Items'] == food]for region in df_food['Region'].unique(): df_region = df_food[df_food['Region'] == region] fig.add_trace(go.Scatter( x=df_region['year'], y=df_region[stat], mode='lines+markers', name=f"{food} - {region} ({stat})", visible=False ))# Set initial visible traces (first food item, 'mean' stat)initial_food = food_items[0]initial_stat ='mean'for trace in fig.data:if initial_food in trace.name and initial_stat in trace.name: trace.visible =True# Create separate dropdown buttons for food items and statisticsfood_buttons = []stat_buttons = []for food in food_items: visibility = [food in trace.name and initial_stat in trace.name for trace in fig.data] food_buttons.append({'label': food,'method': 'update','args': [{'visible': visibility}, {'title': f"{initial_stat.capitalize()} for {food} by Region and Year"}] })for stat in stats: visibility = [initial_food in trace.name and stat in trace.name for trace in fig.data] stat_buttons.append({'label': stat,'method': 'update','args': [{'visible': visibility}, {'yaxis': {'title': stat.capitalize()}}, {'title': f"{stat.capitalize()} for {initial_food} by Region and Year"}] })# Update layout with separate dropdown menusfig.update_layout( title=f"{initial_stat.capitalize()} for {initial_food} by Region and Year", xaxis_title="Year", yaxis_title=initial_stat.capitalize(), template="plotly_white", updatemenus=[ {'buttons': food_buttons,'direction': 'down','showactive': True,'x': 0,'xanchor': 'left','y': 1.15,'yanchor': 'top', }, {'buttons': stat_buttons,'direction': 'down','showactive': True,'x': 0.26,'xanchor': 'left','y': 1.15,'yanchor': 'top', } ])# Show figurefig.show()
The graph shows the mean (average) price or median price of different food items over time for different regions in the Philippines. #### Key Observation 1. Trend - Generally, the price of all food items has been increasing over the years across most regions. There are some fluctuations and periods of stability for some items, but the overall trend is upward. 2. Regional Differences - There is significant variation in price of different food items across regions. Some regions consistently have higher prices than others. Price fluctuations also varies between regions. Some regions experience more volatile prices than others.
Factors like weather patterns, government policies, agriculture changes, food transportation and global market trends can influence these prices.
To know which regions has the highest and lowest price every year for every food item, max, min will be calculated. Then range is next to know how wide the difference between the highest and lowest price.
Code
file_path ="../Data/Interim/cleaned_food_prices.csv"df_range = pd.read_csv(file_path)#dropping columns related to food price indexdf_nofpi_range = df_range.drop(columns=['o_food_price_index', 'h_food_price_index', 'l_food_price_index', 'c_food_price_index', 'inflation_food_price_index', 'trust_food_price_index'])# Convert 'Date' column to datetime formatdf_nofpi_range['Date'] = pd.to_datetime(df_nofpi_range['Date'])#dropping columns related to inflationdf_noinf_range = df_nofpi_range.drop(columns=['inflation_beans','inflation_cabbage', 'inflation_carrots', 'inflation_eggs', 'inflation_garlic', 'inflation_meat_beef_chops', 'inflation_meat_chicken_whole', 'inflation_meat_pork', 'inflation_onions', 'inflation_potatoes', 'inflation_rice', 'inflation_tomatoes'])#dropping columns related to trust scoresdf_cleaned_range = df_noinf_range.drop(columns=['trust_beans','trust_cabbage', 'trust_carrots', 'trust_eggs', 'trust_garlic', 'trust_meat_beef_chops', 'trust_meat_chicken_whole', 'trust_meat_pork', 'trust_onions', 'trust_potatoes', 'trust_rice', 'trust_tomatoes'])#dropping uneeded columnsdf_unneeded_range = df_cleaned_range.drop(columns=['country', 'City', 'lat', 'lon', 'Province', 'year', 'month'])# Reshaping from wide to long format (including year and month as part of the identifier)df_range = df_unneeded_range.melt(id_vars=['Region', 'Date'], var_name='Food_Items', value_name='Price')df_range_filtered = df_range[df_range['Food_Items'].str.startswith('c_')]df_range_filtered.loc[:, 'Food_Items'] = df_range_filtered['Food_Items'].str.replace('c_', '', regex=True)price_stats = df_range_filtered.groupby(['Date', 'Food_Items'])['Price'].agg(['min', 'max', 'mean']).reset_index()price_stats
Date
Food_Items
min
max
mean
Loading ITables v2.2.4 from the init_notebook_mode cell...
(need help?)
Code
def range_chart(price_stats):# Convert Date to datetime if it isn't already price_stats['Date'] = pd.to_datetime(price_stats['Date'])# Get unique food items for our dropdown food_items =sorted(price_stats['Food_Items'].unique())# Initialize our figure fig = go.Figure()# Create a list to store our dropdown menu options dropdown_buttons = []# For each food item, we'll create a filled range plotfor i, food_item inenumerate(food_items):# Filter data for this specific food item item_data = price_stats[price_stats['Food_Items'] == food_item]# Create a filled range plot fig.add_trace( go.Scatter( x=item_data['Date'], y=item_data['max'], name='Price Range', mode='lines', line=dict(width=0), showlegend=False, visible=(i ==0) # Only first item visible initially ) ) fig.add_trace( go.Scatter( x=item_data['Date'], y=item_data['min'], name=f'{food_item} Price Range', fill='tonexty', # Fill area between traces mode='lines', line=dict(width=0), fillcolor='rgba(68, 138, 255, 0.4)', # Semi-transparent blue visible=(i ==0) # Only first item visible initially ) )# Add a line for the mean price fig.add_trace( go.Scatter( x=item_data['Date'], y=item_data['mean'], name='Average Price', mode='lines', line=dict( color='rgb(31, 119, 180)', width=2 ), visible=(i ==0) # Only first item visible initially ) )# Create visibility list for this food item's traces# Each food item has 3 traces (upper bound, lower bound, and mean) visible_list = [False] * (len(food_items) *3) visible_list[i*3:(i*3)+3] = [True, True, True]# Add button for this food item to dropdown menu dropdown_buttons.append(dict( args=[{"visible": visible_list}, {"title": f"Price Range for {food_item} Over Time"}], label=food_item, method="update" ) )# Update layout with dropdown menu and other customizations fig.update_layout( updatemenus=[dict( buttons=dropdown_buttons, direction="down", showactive=True, x=0.1, y=1.15, xanchor="left", yanchor="top" ) ],# Add title and axis labels title=f"Price Range for {food_items[0]} Over Time", xaxis_title="Date", yaxis_title="Price", hovermode='x unified',# Improve layout appearance template="plotly_white", margin=dict(t=100), # Add top margin for dropdown# Improve legend layout legend=dict( orientation="h", yanchor="bottom", y=1.02, xanchor="right", x=1 ) )# Customize hover template to show min, max, and mean pricesfor trace in fig.data: trace.hovertemplate ='%{y:.2f}<extra></extra>'return fig# Example usage:fig = range_chart(price_stats)fig.show()
Code
price_stats['price_spread'] = price_stats['max'] - price_stats['min']def spread_chart(price_stats):# Get unique food items for dropdown food_items =sorted(price_stats['Food_Items'].unique())# Initialize figure figrange = go.Figure()# Create a list to store dropdown menu options dropdown_buttons = []for i, food_item inenumerate(food_items):# Filter data for this food item item_data = price_stats[price_stats['Food_Items'] == food_item]# Create the main line plot for price spread figrange.add_trace( go.Scatter( x=item_data['Date'], y=item_data['price_spread'], name='Price Spread', mode='lines+markers', line=dict(color='blue', width=2), marker=dict(size=6), visible=(i ==0), customdata=np.stack(( item_data['min'], item_data['max'], item_data['mean'] ), axis=-1), hovertemplate=('<b>Date</b>: %{x|%B %Y}<br>'+'<b>Price Spread</b>: ₱%{y:.2f}<br>'+'<b>Minimum Price</b>: ₱%{customdata[0]:.2f}<br>'+'<b>Maximum Price</b>: ₱%{customdata[1]:.2f}<br>'+'<b>Average Price</b>: ₱%{customdata[2]:.2f}<br>'+'<extra></extra>' ) ) )# Calculate statistics for annotations avg_spread = item_data['price_spread'].mean() max_spread = item_data['price_spread'].max() max_spread_date = item_data.loc[item_data['price_spread'].idxmax(), 'Date']# Add a dashed line for average spread figrange.add_trace( go.Scatter( x=[item_data['Date'].min(), item_data['Date'].max()], y=[avg_spread, avg_spread], name='Average Spread', mode='lines', line=dict(color='red', width=2, dash='dash'), visible=(i ==0) ) )# Create visibility list for this food item's traces visible_list = [False] * (len(food_items) *2) visible_list[i*2:(i*2)+2] = [True, True]# Add button for this food item to dropdown menu dropdown_buttons.append(dict( args=[ {"visible": visible_list}, {"title": f"Price Spread Analysis for {food_item}<br>" } ], label=food_item, method="update" ) )# Update layout figrange.update_layout( updatemenus=[dict( buttons=dropdown_buttons, direction="down", showactive=True, x=0.1, y=1.15, xanchor="left", yanchor="top" ) ], title=f"Price Spread Analysis for {food_items[0]}", xaxis_title="Date", yaxis_title="Price Spread in ₱", template="plotly_white", margin=dict(t=120, b=50), # Increased top margin for subtitle legend=dict( orientation="h", yanchor="bottom", y=1.02, xanchor="right", x=1 ),# Add annotation annotations=[dict( text="Price Spread = Maximum Price - Minimum Price", xref="paper", yref="paper", x=0, y=-0.15, showarrow=False, font=dict(size=10) ) ] )return figrangefigrange = spread_chart(price_stats)figrange.show()
Key Observations: 1. Overall Trend - Both the minimum and maximum prices show a general upward trend over the years, indicating an overall increase in the price of every food items. Some even recorded highest price increase on 2020, likely due to the COVID-19 pandemic and its impact on supply chains. 2. Price Range - The gap between the minimum and maximum prices for most items widens significantly in 2020 and 2021. This suggests that there were greater variations in prices across regions during these years.
Posssible causes: 1. COVID-19 Pandemic - The pandemic disrupted supply chains and increased demand for certain food items, which could have led to price volatility. 2. Regional Differences - Differences in the impact of the pandemic on regional economies and agricultural production could have contributed to price variations.
To truly know if the regions really have price difference over the years, hypothesis testing must be done. Please refer to .._testing.ipynb to know why Kruskall-Wallis was used here.
Code
#open the csv file that was already analyzed with kruskall wallis test.hypothesis_results ="../Data/Processed/hypothesis_testing_result.csv"hyp_df = pd.read_csv(hypothesis_results)
Code
# View the first few rows of the datasetprint(hyp_df.head())
Year Food_Item H-statistic p-value Result
0 2007 beans NaN NaN Not Significant
1 2007 c_beans 1077.368029 2.491117e-218 Significant
2 2007 c_cabbage 1105.762042 2.066749e-224 Significant
3 2007 c_carrots 1093.290901 9.693593e-222 Significant
4 2007 c_eggs 905.101619 1.726764e-181 Significant
Code
# Filter the dataframe to keep only rows where 'food_item' starts with 'c_'hyp_df_filtered = hyp_df[hyp_df['Food_Item'].str.startswith('c_')]print(hyp_df_filtered)
Loading ITables v2.2.4 from the init_notebook_mode cell...
(need help?)
Code
# Perform Dunn's Test for all food items and yearsdt_results = []for item in df_dt['Food_Items'].unique():for year in df_dt['year'].unique(): filter_dfdt = df_dt[(df_dt['Food_Items'] == item) & (df_dt['year'] == year)]ifnot filter_dfdt.empty: dunn_result = sp.posthoc_dunn(filter_dfdt, val_col='Price', group_col='Region') dunn_result['Food_Items'] = item dunn_result['year'] = year dt_results.append(dunn_result)# Combine results into a single DataFrame and save to CSVif dt_results: final_results = pd.concat(dt_results) final_results.to_csv("dunns_test_results.csv", index=True)print("Dunn's test results saved to dunns_test_results.csv")else:print("No valid comparisons found.")
Dunn's test results saved to dunns_test_results.csv
This heatmap visualizes the results of a Dunn’s post-hoc test for significant differences in prices of different food items between regions across the years. The color scale indicates the following: * Red (Close to 0): Indicates a statistically significant difference in bean prices between two regions. * Light Green to Yellow (Around 0.2 to 0.8): Suggests a possible difference, but not statistically significant based on the chosen significance level (typically 0.05) * Dark Green (Close to 0): Indicates no statistically significant difference in bean prices between the two regions.
Key Observations
Significant Differences Exist: The presence of so many red cells indicates that there are statistically significant differences in food prices in almost all regions at any given years.
No Significant Differences: patches of green indicate pairs of regions where the Dunn’s test did not find a statistically significant difference in food prices, but there are only few of them.
Market Average Comparison: The “Market Average” row/column allows you to see how each region’s prices compare to the overall average. Dark red cells in this row/column would indicate regions with prices significantly different from the market average.
Food price Prediction 3 years into the future
Code
#dropping uneeded columnsdf_unneeded_range = df_cleaned_range.drop(columns=['country', 'City', 'lat', 'lon', 'Province', 'year', 'month'])# Reshaping from wide to long format (including year and month as part of the identifier)df_range = df_unneeded_range.melt(id_vars=['Region', 'Date'], var_name='Food_Items', value_name='Price')df_range_filtered = df_range[df_range['Food_Items'].str.startswith('c_')]df_range_filtered.loc[:, 'Food_Items'] = df_range_filtered['Food_Items'].str.replace('c_', '', regex=True)# Sort by datedf = df_range_filtered.sort_values(by=['Region', 'Date'])
Code
# Suppress specific warningswarnings.filterwarnings("ignore", category=UserWarning, module="sklearn")warnings.filterwarnings("ignore", category=FutureWarning, module="sklearn")warnings.filterwarnings("ignore", category=FutureWarning, module="pmdarima")warnings.filterwarnings("ignore", category=DeprecationWarning, module="pmdarima")# Convert date column to datetimedf['Date'] = pd.to_datetime(df['Date'])# Sort by datedf = df.sort_values(by=['Region', 'Date'])# Dropdown for interactive visualizationregions = df['Region'].unique()items = df['Food_Items'].unique()region_dropdown = widgets.Dropdown(options=regions, description='Region:')item_dropdown = widgets.Dropdown(options=items, description='Food Item:')def preprocess_data(region, item): region_df = df[(df['Region'] == region) & (df['Food_Items'] == item)] region_df = ( region_df.groupby('Date', as_index=False)['Price'] .mean() # Aggregate duplicate dates by averaging ) region_df = region_df.set_index('Date').asfreq('MS') # Ensure monthly frequency region_df['Price'] = region_df['Price'].interpolate() # Fill missing valuesreturn region_dfdef update_graph(region, item): region_df = preprocess_data(region, item)iflen(region_df) <12:print(f"Not enough data for {region} - {item} to forecast.")return# Use AutoARIMA to find the best model model = auto_arima(region_df['Price'], seasonal=True, m=12, stepwise=True, trace=True, max_order=None, stationary=False)# Forecast the next 36 months forecast = model.predict(n_periods=36)# Plot the historical data and forecast plt.figure(figsize=(12, 6)) plt.plot(region_df.index, region_df['Price'], label='Historical Prices') plt.plot(pd.date_range(region_df.index[-1], periods=37, freq='MS')[1:], forecast, label='Forecast', linestyle='dashed', color='red') plt.xlabel('Date') plt.ylabel('Price') plt.legend() plt.title(f'Price Forecast for {region} - {item} (Next 36 Months)') plt.show()interactive_plot = widgets.interactive(update_graph, region=region_dropdown, item=item_dropdown)display(interactive_plot)
Key Observations
Historical Price Volatility: The blue line representing historical prices shows significant fluctuations, especially noticeable spikes around 2008 and smaller ones in other years. This indicates that most food prices have been subject to considerable volatility over the past two decades.
Recent Price Increase: There’s a clear upward trend in food prices in the period leading up to the forecast (roughly from 2020 to 2024). This suggests recent factors have been pushing prices higher.
Relatively Stable Forecast: The red dashed line, representing the forecast for the next 36 months, shows a relatively stable and slightly upward trend. The model predicts a continued but moderated increase compared to the recent past.
Key Observations
Historical Volatility:
Weather Patterns:adverse weather events in key agricultural regions can significantly impact supply and cause price volatility.
Changes in Demand: Fluctuations in global demand, driven by population growth or economic factors, can also contribute to price swings.
Recent Price Increase:
Inflation: General inflation in the economy could be pushing up agricultural commodity prices.
Supply Chain Issues: Disruptions to global supply chains (e.g., due to pandemics, conflicts, or trade policies) can lead to shortages and price increases.
Increased Input Costs: Rising costs of fertilizers, fuel, and labor can increase the cost of food production, which could be passed on to consumers.
Relatively Stable Forecast:
Model Assumption: ARIMA models assume that, to some extent, future trends will resemble past patterns. The stable forecast suggests the model has not picked up any strong upward or downward pressures for the next 36 months beyond what is already captured in the recent trend.